Bug report
An example output for my failed GCP Batch jobs:
2025-05-14 11:08:52.610 PDT mv: preserving times for '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_ExonOverIntron/raw.h5ad': No space left on device
2025-05-14 11:08:52.610 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_ExonOverIntron/raw.h5ad': No space left on device
2025-05-14 11:08:52.835 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_ExonOverIntron/Summary.csv': No space left on device
2025-05-14 11:08:53.027 PDT mv: preserving times for '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_Ex50pAS/raw.h5ad': No space left on device
2025-05-14 11:08:53.027 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/GeneFull_Ex50pAS/raw.h5ad': No space left on device
2025-05-14 11:08:53.287 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/Gene/Summary.csv': No space left on device
2025-05-14 11:08:53.746 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./versions.yml': No space left on device
2025-05-14 11:08:58.815 PDT Task task/nf-13150368-174723-238fac2a-1e92-4c510-group0-0/0/0 runnable 0 exited with status 0
2025-05-14 11:08:58.815 PDT Task task/nf-13150368-174723-238fac2a-1e92-4c510-group0-0/0/0 background runnables all exited on their own.
2025-05-14 11:08:58.815 PDT Task task/nf-13150368-174723-238fac2a-1e92-4c510-group0-0/0/0 succeeded
I'm using:
process {
    shell = ['/bin/bash', '-euo', 'pipefail']
}
... so I don't see why the mv commands that hit the "No space left on device" errors are not causing the job to fail.
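For reference, this is the behaviour I would expect from those shell flags; a minimal standalone bash sketch (the destination path is made up), not part of the pipeline:

set -euo pipefail                          # same flags as the process.shell setting above
mv raw.h5ad /path/on/a/full/disk/raw.h5ad  # if mv fails here, it exits non-zero
echo "this line should never run"          # -e aborts the script first, so the task should exit non-zero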
I'm guessing the "No space left on device" errors mean the task is running out of disk space. My process:
process STAR_map {
    label "STAR_env"
    cpus Math.min(params.cpus_max, 20)
    memory { 40.GB * (1 + 0.5 * task.attempt) }
    time { 4.h + (4 * task.attempt).h }
    // request ~10x the combined FASTQ size, rounded up to whole 375 GB local SSDs, scaled up on retries
    disk {
        def read1_size_gb = fastq_read1.size() / 1024 ** 3
        def read2_size_gb = fastq_read2.size() / 1024 ** 3
        def size_gb = (read1_size_gb + read2_size_gb) * 10
        def ssd_count = Math.max(1, Math.ceil(size_gb / 375)).intValue()
        println "${meta.id} => size_gb: ${size_gb.round(1)}; ssd_count: ${ssd_count}"
        [request: (375 * ssd_count * task.attempt).GB, type: "local-ssd"]
    }
    input:
    tuple val(meta), path(fastq_read1), path(fastq_read2), path(genome_dir)
    each path(star_par_file)

    output:
    path "*", emit: all // all files/folders will be saved into the $outdir/STAR directory
    tuple val(meta), path("${meta.id}_Aligned.toTranscriptome.out.bam"), emit: trbam // Transcriptome alignments will be passed to Salmon (or other transcript quantification)
    tuple val(meta), path("${meta.id}_Aligned.sortedByCoord.out.bam"), emit: bam // Genome alignments, sorted by coordinate by STAR
    tuple val(meta), path("${meta.id}_ReadsPerGene.out.tab"), emit: reads_per_gene // for MultiQC
    tuple val(meta), path("Log.final.out"), emit: log_final // STAR log file
    tuple val(meta), path("Solo.out"), emit: solo // Solo output directory
    path "versions.yml", emit: versions

    script:
    """
    STAR ${params.extra_pars_star} \\
        --runThreadN ${task.cpus} \\
        --parametersFiles ${star_par_file} \\
        --genomeDir ${genome_dir} \\
        --readFilesIn ${fastq_read1} ${fastq_read2} \\
        --outSAMattrRGline ID:${meta.id} SM:${meta.id} PL:ILLUMINA \\
        --soloStrand ${meta.strandedness} \\
        2>&1 | tee ${task.process}_${meta.id}.log

    # remove temporary STAR files (sometimes they are not removed by STAR)
    rm -rf _STARtmp

    echo "# Compressing the output for the sake of scanpy" | tee -a ${task.process}_${meta.id}.log
    find Solo.out -type f -name "*.mtx" | xargs -P ${task.cpus} gzip
    find Solo.out -type f -name "*.tsv" | xargs -P ${task.cpus} gzip

    echo "# Converting solo output to h5ad" | tee -a ${task.process}_${meta.id}.log
    mtx-to-h5ad.py --output-dir Solo.out --sample "${meta.id}" Solo.out

    # rename Aligned.toTranscriptome.out.bam by adding the sample name
    mv Aligned.toTranscriptome.out.bam ${meta.id}_Aligned.toTranscriptome.out.bam
    mv Aligned.sortedByCoord.out.bam ${meta.id}_Aligned.sortedByCoord.out.bam
    mv ReadsPerGene.out.tab ${meta.id}_ReadsPerGene.out.tab

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        star: \$(STAR --version | sed -e "s/STAR_//g")
    END_VERSIONS
    """
}
...but because the GCP Batch jobs exit with status 0, the failures are hard to troubleshoot.
Expected behavior and actual behavior
Nextflow does not appear to propagate all bash command errors, so GCP Batch jobs that hit these errors exit with status 0 instead of a non-zero value.
I've included du -sh commands in my STAR_map process, and the total file size is ~100 GB after STAR mapping and subsequent steps; however, the job still throws mv: failed to close errors such as:
2025-05-14 11:08:53.287 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./Solo.out/Gene/Summary.csv': No space left on device
2025-05-14 11:08:53.746 PDT mv: failed to close '/mnt/disks/arc-genomics-nextflow/work/13/150368f33208ba6fef5ff688caf8c7/./versions.yml': No space left on device
even when I provide 9 local SSDs (375 * 9 = 3375 GB).
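For reference, the du -sh checks mentioned above sit at the end of the STAR_map script block and look roughly like this (a sketch; the exact commands in my pipeline differ slightly):

du -sh ./* 2>&1 | sort -h | tee -a ${task.process}_${meta.id}.log   # per-file/directory sizes in the task work dir
du -sh . 2>&1 | tee -a ${task.process}_${meta.id}.log               # total size of the work dir (~100 GB in these runs)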
At least in some cases, increasing google.batch.bootDiskSize to 100-200 GB fixes the issue. However, every GCP Batch job then gets the large boot disk, even though only specific processes (e.g., STAR_map in my case) actually need it.
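For context, that setting is global in nextflow.config, which is why every GCP Batch job ends up with the larger boot disk; a minimal sketch (the value is just an example from the 100-200 GB range above):

google {
    batch.bootDiskSize = 150.GB   // applied to every task submitted via GCP Batch, not just STAR_map
}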
Steps to reproduce the problem
See above
Program output
The relevant log:
Environment
Additional context
See https://nfcore.slack.com/archives/C02T98A23U7/p1736629652539469?thread_ts=1667836041.736919&cid=C02T98A23U7 for more context